As the strength of a stimulus increases, the proportion of correct binary responses 
increases, which defines the psychometric function. Simultaneously, mean reaction times 
(RT) decrease, which collectively define the chronometric function. However, RTs are 
traditionally ignored when estimating psychophysical parameters, even though they 
may provide additional Shannon information. Here, we extend Palmer et al.'s (2005) 
proportional-rate diffusion model (PRD) by: (a) fitting individual RTs to an inverse Gaussian 
distribution, (b) including a lapse rate parameter, (c) including a point-of-subjective-equality 
(PSE) parameter, and (d) using a two-alternative forced choice (2AFC) design based on the proportion of times a 
variable comparison stimulus is chosen. Maximum likelihood estimates of mean RT values 
(from fitted inverse Gaussians) and binary responses were fitted both separately and in 
combination to this extended PRD (EPRD) model, to obtain psychophysical parameter 
values. Values estimated from binary responses alone (i.e., the psychometric function) 
were found to be similar to those estimated from RTs alone (i.e., the chronometric 
function), which provides support for the underlying diffusion model. The EPRD model 
was then used to estimate the mutual information between binary responses and stimulus 
strength, and between RTs and stimulus strength. These provide conservative bounds for 
the average amount of Shannon information the observer gains about stimulus strength on 
each trial. For the human experiment reported here, the observer gains between 2.68 and 
3.55 bits/trial. These bounds are monotonically related to a new measure, the Shannon 
increment, which is the expected value of the smallest change in stimulus strength 
detectable by an observer. 

Keywords: psychometric function, chronometric function, point of subjective equality, diffusion model, reaction 
time, threshold, Shannon information, mutual information 



1. INTRODUCTION 

For over 100 years, it has been known that the ability to discriminate 
between two stimuli increases as a sigmoidal function 
of the difference between those stimuli, where this is traditionally 
measured using binary observer responses. However, when 
an observer makes a response, there is a trade-off between speed, 
or reaction time (RT), and accuracy of responses. This speed-accuracy 
trade-off has been the subject of numerous papers, 
notably Ratcliff (1978), Harvey (1986), Swanson and Birch (1992), 
Wichmann and Hill (2001), Palmer et al. (2005), and more recently 
Bonnet et al. (2008). 

Here, we propose four extensions to the proportional-rate 
diffusion model (PRD) proposed in Palmer et al. (2005). First, 
we introduce a new parameter, the point-of-subjective-equality 
(PSE), which takes account of systematic shifts or bias in observer 
perception. This parameter is incorporated into the chronometric 
and psychometric functions. Second, we use a maximum 
likelihood estimate (MLE) of the RT mean based on a physically 
motivated diffusion model of RTs which involves fitting 
individual RTs to an inverse Gaussian distribution. Third, we 
take account of lapses in observer concentration by introducing 
a lapse rate parameter, which is estimated simultaneously with 
other psychophysical parameters. Fourth, we use a two-alternative 
forced choice (2AFC) design where the psychometric function 
is defined, not by the proportion of correct responses (range 
50-100%), but by the proportion of times a variable comparison 
stimulus is chosen in preference to a fixed reference stimulus 
(range 0-100%). Note that the 2AFC experimental procedure is 
the same whether one chooses to measure the proportion of correct 
responses or the proportion of times a variable comparison 
stimulus is chosen. 

Once the model has been fitted to these data, it can be used 
to estimate the mutual information (Shannon and Weaver, 1949; 
MacKay, 2003; Stone, 2014) between binary responses and stim- 
ulus strength, and between RT and stimulus strength. Finally, the 
mutual information provides a value for the Shannon increment, 
which is the expected value of the smallest change in stimulus 
strength detectable by an observer. 

2. THE PROPORTIONAL-RATE DIFFUSION MODEL 

We provide a brief summary of Palmer et al.'s PRD model (Palmer 
et al., 2005) here, and describe extensions below. In the experiment 
described in Palmer et al. (2005), an observer is presented 
with an array of moving dots. Stimulus strength $x$ is defined by 
coherence (i.e., the percentage of dots moving in the same direction), 
and the observer is required to indicate which one of two 
directions the dots are moving. Note that coherence, and therefore 
stimulus strength $x$, varies between zero and some upper 
bound. 

The PRD model is based on a diffusion model of RT, where the 
mean RT $\tau_{PRD}$ varies with $x$ as 

$$\tau_{PRD} = \frac{A}{kx} \tanh(Akx) + \tau_{res}, \tag{1}$$

where $k$ is a measure of observer sensitivity, and $A$ represents a 
decision boundary associated with RT. The first term on the right 
hand side represents the time to make a decision, and $\tau_{res}$ is a fixed 
residual RT (e.g., time to respond after a decision is made). Notice 
that this model requires that the mean RT $\tau_{PRD}$ decreases monotonically 
as the motion signal increases above zero, a requirement 
which will be relaxed in the model proposed below. 

Within the PRD model, the probability $P_{PRD}$ of making a correct 
response is defined by the logistic psychometric function 

$$P_{PRD} = \frac{1}{1 + e^{-2Ak|x|}}, \tag{2}$$

where $|x|$ indicates the absolute value of $x$. In Equation (2), the 
product $Ak$ acts as a single parameter which modulates the steepness 
of the sigmoidal function, and therefore acts as a measure of 
sensitivity to changes in stimulus strength. Note that the stimulus 
strength cannot fall below zero in Palmer et al.'s moving 
dot experiment, and that, when the stimulus motion strength is 
$x = 0\%$, the observer has to guess, so that $P_{PRD} = 0.5$, whereas if 
$x = 100\%$ then $P_{PRD} \approx 1.0$. 

3. THE EXTENDED PROPORTIONAL-RATE DIFFUSION (EPRD) 
MODEL 

The model proposed here is based on the assumption that 
responses arise from a two-alternative forced choice (2AFC) procedure. 
On each trial, the observer is presented with two stimuli, 
and the task is to choose the stronger stimulus, where strength can 
be defined in terms of differences in any physical quantity, such 
as speed, luminance, or contrast. The two stimuli are a reference 
stimulus with a stimulus value $s_r$ that remains constant within a 
specific subset of trials, and a comparison stimulus with a value $s_c$ 
that varies between trials. A comparison response is obtained if the 
observer chooses the comparison stimulus. The stimulus strength 
$x$ within one trial is defined as the difference between the comparison 
value $s_c$ and the reference value $s_r$, specifically $x = s_c - s_r$. 

We measure performance in terms of the proportion $P$ of 
times that a variable comparison stimulus is chosen in preference 
to the fixed reference stimulus, which we define as a comparison 
stimulus response, so $P$ varies between zero and one. A 
direct translation from $P_{PRD}$ to $P$ would guarantee that a stimulus 
strength of zero corresponds to $P = 0.5$. However, if observer 
perception is biased, such that a stimulus difference of $x = 0$ is 
not perceived as zero, then a stimulus strength of zero would not 
coincide with $P = 0.5$. This perceptual bias can be accommodated 
with a second modification, a new parameter $s_{PSE}$, which is the 
point-of-subjective-equality (PSE) between the comparison and 
reference stimuli. Specifically, $s_{PSE}$ is the value $s_c$ of the comparison 
stimulus which is perceived to be the same as the value $s_r$ of 
the reference stimulus. 

Given that the stimulus strength is $x = s_c - s_r$, the perceived 
stimulus strength $x'$ is 

$$x' = s_c - s_{PSE} \tag{3}$$
$$\phantom{x'} = x - \Delta x, \tag{4}$$

where $\Delta x = s_{PSE} - s_r$ is the error in the perceived value of $s_c$. The probability 
of choosing the comparison stimulus is defined as 

$$P = \frac{1}{1 + e^{-2Akx'}}. \tag{5}$$

Note that the product $Ak$ effectively acts as a single parameter, 
and will be treated as such for binary response data (but not for 
RT data, see below). 

In order to take account of observer lapses in concentration, 
which result in a pure guess, we introduce a lapse rate parameter 
$\gamma$. Evidence presented in Wichmann and Hill (2001) suggests 
that failure to take account of the lapse rate can lead to substantial 
errors in estimated psychophysical parameter values. If the lapse 
rate were zero then we would expect that $P = 0$ for highly negative 
stimulus strengths, and that $P = 1$ for highly positive stimulus 
strengths, so that observed deviations from $P = 0$ and $P = 1$ at 
extreme stimulus strengths can be used to provide an estimate of 
the lapse rate. Thus, the lapse rate parameter limits the lower and 
upper bounds of the psychometric function to $P_{min} = \gamma/2$ and 
$P_{max} = 1 - \gamma/2$, respectively, such that^1 

$$P = \left( \frac{1}{1 + e^{-2Akx'}} - 0.5 \right)(1 - \gamma) + 0.5. \tag{6}$$
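As a minimal sketch of Equation (6), the function below evaluates the lapse-adjusted psychometric function for a comparison value. The function name, argument names, and the example parameter values are illustrative assumptions, not taken from the paper. The logistic term is written in its equivalent tanh form, $1/(1+e^{-2u}) = (1+\tanh u)/2$, purely to avoid numerical overflow.

```python
import math


def eprd_psychometric(s_c, s_pse, ak, lapse):
    """Probability of choosing the comparison stimulus (Equation 6).

    s_c   : comparison stimulus value
    s_pse : point of subjective equality (hypothetical value in the demo)
    ak    : the product A*k, treated as a single parameter
    lapse : lapse rate gamma
    """
    x_perceived = s_c - s_pse  # perceived stimulus strength, Equation (3)
    # Logistic of Equation (5), using 1/(1+exp(-2u)) == (1+tanh(u))/2.
    core = 0.5 * (1.0 + math.tanh(ak * x_perceived))
    # Rescale so the curve runs from lapse/2 up to 1 - lapse/2.
    return (core - 0.5) * (1.0 - lapse) + 0.5


# Illustrative parameter values only (assumed, not fitted to any data).
for s_c in (0.90, 1.00, 1.03, 1.10):
    print(s_c, eprd_psychometric(s_c, s_pse=1.03, ak=40.0, lapse=0.02))
```

At the PSE the function returns 0.5, and at extreme stimulus values it saturates at $\gamma/2$ and $1 - \gamma/2$, as the text describes.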



Thus, the three parameters to be estimated for Equation (6) define 
the vector variable 

$$\theta_P = (s_{PSE}, Ak, \gamma). \tag{7}$$

Similarly, we model the observer's mean RT for a perceived 
stimulus strength $x'$ as 

$$\tau = \frac{A}{kx'} \tanh(kAx') + \tau_{res}. \tag{8}$$

Here, the effects of $A$ and $k$ are separable, and so the four parameters 
to be estimated for Equation (8) define the vector variable 

$$\theta_\tau = (s_{PSE}, A, k, \tau_{res}). \tag{9}$$

The lapse rate parameter is not included here because lapses have 
no predictable effect on RT. 
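The chronometric function of Equation (8) can be sketched in the same way. The names and parameter values below are assumptions chosen for illustration. Because $\tanh(u)/u \to 1$ as $u \to 0$, the decision time tends to $A^2$ at the PSE, and the sketch handles that limit explicitly.

```python
import math


def eprd_mean_rt(s_c, s_pse, a, k, t_res):
    """Mean RT predicted by the EPRD model (Equation 8)."""
    x_perceived = s_c - s_pse          # Equation (3)
    u = k * a * x_perceived
    if abs(u) < 1e-8:
        # Limit of (A / (k x')) * tanh(k A x') as x' -> 0 is A**2.
        decision_time = a * a
    else:
        decision_time = (a / (k * x_perceived)) * math.tanh(u)
    return decision_time + t_res       # add the residual (non-decision) time


# Hypothetical parameter values, for illustration only.
for s_c in (0.98, 1.03, 1.08):
    print(s_c, eprd_mean_rt(s_c, s_pse=1.03, a=1.0, k=20.0, t_res=0.5))
```

The predicted mean RT peaks at the PSE and falls off symmetrically on either side, which is the behavior the EPRD model is intended to capture.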

Finally, we can adapt results from Luce (1986) and Palmer et al. 
(2005) to relate RT to response probability. The mean decision 
time is defined as $\tau_{dec} = \tau - \tau_{res}$, so that Equations (5, 8) can be 
combined (using $\tanh(Akx') = 2P - 1$ from Equation 5) to provide a mapping between mean decision time $\tau_{dec}$ 
and the probability $P$ of choosing the comparison stimulus 

$$\tau_{dec} = \frac{A}{kx'} (2P - 1). \tag{10}$$

Thus, if the perceived stimulus strength $x'$ has a large positive or 
negative value then $P \approx 1$ or $P \approx 0$ (respectively), and so $\tau_{dec} \approx A/(k|x'|)$ 
in both cases. This predicts that, for a given perceived 
stimulus strength, the mean decision time is linearly related to the 
probability of choosing the comparison stimulus. 

^1 Notice that, if the lapse rate is $\gamma = 0.01$ then the upper and lower bounds 
are 0.995 and 0.005, respectively, because half of the observer's guesses will be 
correct, on average. 

4. USING OBSERVER RESPONSES 

For each trial, we obtain an RT and a binary response from the 
observer, which indicates whether the observer has chosen the 
comparison stimulus or the reference stimulus. At each stimulus 
strength $x_i$, the comparison and reference stimuli are presented to 
the observer on $N_i$ trials, and the number of times the observer 
chooses the comparison and reference stimulus is recorded as $n_i$ 
and $N_i - n_i$, respectively. For a given putative value of $P_i$, a standard 
binomial model gives the probability of the observed binary 
responses as 

$$p(n_i | N_i, P_i) = C_{N_i,n_i} \, P_i^{n_i} (1 - P_i)^{N_i - n_i}, \tag{11}$$

where $C_{N_i,n_i}$ is the binomial coefficient and $P_i$ is a function of the parameters $Ak$, $\gamma$, and $s_{PSE}$ as defined 
in Equation (6). The maximum likelihood estimate of $P_i$ is the 
proportion of comparison stimulus responses $P'_i = n_i/N_i$. 
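To make the binomial model concrete, the sketch below sums the log of Equation (11) over stimulus strengths, with each $P_i$ computed from Equation (6). The function names, the toy counts, and the parameter values are assumptions for illustration; natural logarithms are used, which does not change the location of the maximum.

```python
import math


def psychometric(x_perceived, ak, lapse):
    """Equation (6), with the logistic written in its tanh form."""
    core = 0.5 * (1.0 + math.tanh(ak * x_perceived))
    return (core - 0.5) * (1.0 - lapse) + 0.5


def binomial_log_likelihood(params, s_values, n_chosen, n_trials):
    """Sum over stimulus strengths of the log of Equation (11)."""
    s_pse, ak, lapse = params
    total = 0.0
    for s_c, n_i, big_n_i in zip(s_values, n_chosen, n_trials):
        p_i = psychometric(s_c - s_pse, ak, lapse)
        # Guard against log(0) when the model saturates.
        p_i = min(max(p_i, 1e-12), 1.0 - 1e-12)
        total += (n_i * math.log(p_i)
                  + (big_n_i - n_i) * math.log(1.0 - p_i)
                  + math.log(math.comb(big_n_i, n_i)))
    return total


# Toy data (hypothetical): three comparison values, 20 trials each.
s_values = [0.95, 1.00, 1.05]
n_trials = [20, 20, 20]
n_chosen = [2, 10, 18]

ll_good = binomial_log_likelihood((1.00, 20.0, 0.0), s_values, n_chosen, n_trials)
ll_bad = binomial_log_likelihood((1.05, 20.0, 0.0), s_values, n_chosen, n_trials)
print(ll_good, ll_bad)
```

A parameter vector whose PSE matches the toy data yields a higher log likelihood than one with a shifted PSE; a numerical optimizer would search for the parameter values that maximize this quantity.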

When considered over all values of $x$, the probability of 
observing the set of all binary responses is defined by the log 
likelihood function 

$$L_P = \log \prod_{i=1}^{N_x} C_{N_i,n_i} \, P_i^{n_i} (1 - P_i)^{N_i - n_i} \tag{12}$$
$$\phantom{L_P} = \sum_{i=1}^{N_x} n_i \log P_i + \sum_{i=1}^{N_x} (N_i - n_i) \log(1 - P_i) + \sum_{i=1}^{N_x} \log C_{N_i,n_i}, \tag{13}$$

where $N_x$ is the number of stimulus strengths. The final term does not depend on parameter values, 
and can be discarded unless the exact value of the likelihood is 
required. Recall that each $P_i$ is determined by Equation (6), which 
is a function of the EPRD parameter values $\theta_P = (s_{PSE}, Ak, \gamma)$. 
The maximum likelihood estimate (MLE) of $\theta_P$ is obtained by 
finding EPRD parameter values $\theta_P$ that maximize $L_P$. 

If the number of trials at each stimulus strength is large then 
Equation (13) can be approximated by a Gaussian function. At 
a given stimulus strength $x_i$, the observed proportion of binary 
responses is $P'_i$, which is assumed to be the probability $P_i$ plus a 
noise term $\eta_P$, so that $P'_i = P_i + \eta_P$. If the noise $\eta_P$ has a Gaussian 
distribution with variance $v_{P_i}$, then 

$$p(P'_i | P_i) = \frac{1}{\sqrt{2\pi v_{P_i}}} \exp\left[ -\frac{(P'_i - P_i)^2}{2 v_{P_i}} \right], \tag{14}$$

where $P_i$ is defined as a function of $A$, $k$, and $x'$ in Equation (6), 
and the variances $v_{P_i}$ can be estimated from the data as 
$v_{P_i} = P'_i (1 - P'_i)/N_i$. Results for the Gaussian approximation in 
Equation (14) were found to be very similar to those for Equation 
(13). Results reported here are based on Equation (13). 

5. USING REACTION TIMES 

RTs tend to be short if the comparison stimulus value is very different 
from the reference stimulus, but as the comparison and 
reference stimuli become more similar, so the RT increases, as 
shown in Figure 4B. Here, we use RTs in a two stage process. First, 
a mean RT value is estimated at each stimulus strength. These 
mean RT values are then used as data for the $RT_\tau$ model, which is 
used to estimate EPRD model parameters. 

5.1. INVERSE GAUSSIAN MODEL OF INDIVIDUAL RTs 

It is commonly assumed that the RT is the time required for the 
cumulative amount of perceptual evidence to reach some criterion 
value (Ratcliff, 1978; Smith, 1990). Specifically, this evidence 
accumulation is assumed to consist of a Brownian diffusion process 
with positive drift, which can be likened to the total distance 
traveled in a one-dimensional biased random walk. If a Brownian 
process is allowed to run for a fixed time then it is well known 
that the final distribution of values (e.g., evidence) has a Gaussian 
distribution. However, it is less well known that if a Brownian 
diffusion process is allowed to run until it reaches a fixed criterion 
value then the time taken to reach that value has an inverse 
Gaussian or Wald distribution (see Figure 3). Therefore, if the 
amount of evidence required to make a response is stable for 
a given observer then RTs are appropriately modeled using an 
inverse Gaussian distribution^2. 

If RTs have an inverse Gaussian distribution with mean $\tau'_i$ then 
the probability of a single observed RT $\tau_{ij}$ associated with the $j$th 
presentation of the stimulus value $x_i$ is 

$$p(\tau_{ij} | \tau'_i, \lambda_i) = \left( \frac{\lambda_i}{2\pi \tau_{ij}^3} \right)^{1/2} \exp\left[ \frac{-\lambda_i (\tau_{ij} - \tau'_i)^2}{2 (\tau'_i)^2 \tau_{ij}} \right], \tag{15}$$

where the variance of this distribution is 

$$v_{\tau_i} = (\tau'_i)^3 / \lambda_i. \tag{16}$$

Each of the stimulus strengths is presented $N_i$ times. For one 
model RT mean, the probability of the observed $N_i$ RTs (one RT 
per trial) defines the log likelihood function 

$$L_{\tau_i} = \sum_{j=1}^{N_i} \log p(\tau_{ij} | \tau'_i, \lambda_i). \tag{17}$$

Maximizing Equation (17) with respect to the parameters $\tau'_i$ and 
$\lambda_i$ yields a maximum likelihood estimate (MLE) of both parameters 
at one stimulus strength $x_i$. Even though the algebraic mean 
and the MLE mean are identical (Tweedie, 1957) for the inverse 
Gaussian, the fitting process provides the parameter estimate $\lambda_i$, 
which is vital for subsequent calculations. 

^2 For reference, the Wald distribution is the distribution of first passage times 
of a biased Brownian process, and is qualitatively similar to the log-normal 
distribution, which is often used to model RT. 
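The inverse Gaussian fit at a single stimulus strength can be sketched as follows. This uses the standard closed-form maximum likelihood estimates (the sample mean for the mean, and the reciprocal of the average of $1/\tau_{ij} - 1/\tau'_i$ for $\lambda_i$) instead of a numerical optimizer; whether the original analysis used the closed form is not stated, so treat this as one possible route. The function names and the three toy RTs are assumptions.

```python
import math


def fit_inverse_gaussian(rts):
    """Closed-form MLE of the inverse Gaussian mean and shape parameter.

    Assumes the RTs are positive and not all identical (otherwise the
    shape estimate is undefined).
    """
    n = len(rts)
    mean = sum(rts) / n                                    # MLE of the mean
    inv_lam = sum(1.0 / t - 1.0 / mean for t in rts) / n   # MLE of 1/lambda
    return mean, 1.0 / inv_lam


def inverse_gaussian_log_pdf(t, mean, lam):
    """Natural log of the density in Equation (15)."""
    return (0.5 * math.log(lam / (2.0 * math.pi * t ** 3))
            - lam * (t - mean) ** 2 / (2.0 * mean ** 2 * t))


def log_likelihood(rts, mean, lam):
    """Log likelihood of Equation (17) for one stimulus strength."""
    return sum(inverse_gaussian_log_pdf(t, mean, lam) for t in rts)


rts = [1.0, 2.0, 4.0]                  # toy RTs in seconds (hypothetical)
mean_hat, lam_hat = fit_inverse_gaussian(rts)
variance_hat = mean_hat ** 3 / lam_hat  # Equation (16)
print(mean_hat, lam_hat, variance_hat)
```

The estimated $\lambda_i$ is what allows the variance of Equation (16) to be computed, which is why the fit is needed even though the MLE mean equals the algebraic mean.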

5.2. MODEL $RT_\tau$: USING MEAN REACTION TIMES 

For a given stimulus strength $x_i$, the predicted mean RT $\tau_i$ varies 
as a tanh function of $x_i$, as defined in Equation (8). The central 
limit theorem allows us to assume that the distribution of mean 
RTs of the inverse Gaussian pdf at a given stimulus strength $x_i$ is 
Gaussian with mean $\tau'_i$ and variance $v_{\bar{\tau}_i}$. Therefore, the likelihood 
of the EPRD mean $\tau_i$ from Equation (8) is 

$$p(\tau'_i | \tau_i) = \frac{1}{\sqrt{2\pi v_{\bar{\tau}_i}}} \exp\left[ -\frac{(\tau'_i - \tau_i)^2}{2 v_{\bar{\tau}_i}} \right]. \tag{18}$$

The variance of an inverse Gaussian distribution of RT values with 
mean $\tau'_i$ is $v_{\tau_i}$ (Equation 16), so the variance $v_{\bar{\tau}_i}$ of a distribution 
of means (where each mean is based on $N_i$ samples) is 

$$v_{\bar{\tau}_i} = v_{\tau_i} / N_i. \tag{19}$$

Thus, we can assess the fit of the inverse Gaussian mean RTs $\tau'_i$ to 
the EPRD mean RTs $\tau_i$ of Equation (8) as follows. The probability 
of the mean RTs $\tau'_i$ (one mean RT per stimulus strength) 
defines the log likelihood function 

$$L_\tau = \log \prod_{i=1}^{N_x} p(\tau'_i | \tau_i) \tag{20}$$
$$\phantom{L_\tau} = -\frac{1}{2} \sum_{i=1}^{N_x} \frac{(\tau'_i - \tau_i)^2}{v_{\bar{\tau}_i}} - \frac{1}{2} \sum_{i=1}^{N_x} \log 2\pi v_{\bar{\tau}_i}, \tag{21}$$

where $N_x$ is the number of stimulus strengths and $\tau_i$ is defined in Equation (8), so that the parameters to 
be estimated for model $RT_\tau$ are $\theta_\tau = (s_{PSE}, A, k, \tau_{res})$ to fit the 
overall variation in mean RT with stimulus strength x. 
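The Gaussian log likelihood of Equation (21) can be sketched as below, taking the per-strength MLE means and $\lambda_i$ values from the inverse Gaussian fits as data. All names, the synthetic data, and the parameter values are assumptions for illustration, and natural logarithms are used.

```python
import math


def eprd_mean_rt(x_perceived, a, k, t_res):
    """Equation (8); tanh(u)/u is replaced by its limit 1 near u = 0."""
    u = k * a * x_perceived
    ratio = 1.0 if abs(u) < 1e-8 else math.tanh(u) / u
    return a * a * ratio + t_res


def mean_rt_log_likelihood(params, s_values, mle_means, mle_lambdas, n_trials):
    """Log likelihood of the fitted mean RTs under the EPRD model (Equation 21)."""
    s_pse, a, k, t_res = params
    total = 0.0
    for s_c, tau_mle, lam, n_i in zip(s_values, mle_means, mle_lambdas, n_trials):
        tau_model = eprd_mean_rt(s_c - s_pse, a, k, t_res)
        var_single = tau_mle ** 3 / lam   # Equation (16)
        var_mean = var_single / n_i       # Equation (19)
        total += (-0.5 * (tau_mle - tau_model) ** 2 / var_mean
                  - 0.5 * math.log(2.0 * math.pi * var_mean))
    return total


# Synthetic, noise-free "data" generated from hypothetical parameters.
true_params = (1.03, 1.0, 20.0, 0.5)      # (s_PSE, A, k, tau_res)
s_values = [0.95, 1.00, 1.03, 1.06, 1.10]
mle_means = [eprd_mean_rt(s - true_params[0], 1.0, 20.0, 0.5) for s in s_values]
mle_lambdas = [8.0] * len(s_values)
n_trials = [20] * len(s_values)

ll_true = mean_rt_log_likelihood(true_params, s_values, mle_means, mle_lambdas, n_trials)
ll_shift = mean_rt_log_likelihood((1.00, 1.0, 20.0, 0.5), s_values, mle_means, mle_lambdas, n_trials)
print(ll_true, ll_shift)
```

Because the variances are computed from the data and not from the model parameters, only the squared-error term changes as the parameters are varied.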

In summary, we have three estimates of the mean RT at each 
stimulus strength: the algebraic mean and the MLE mean $\tau'_i$ of the 
inverse Gaussian or Wald pdf (from Equation 17), which collectively 
are used as data to estimate the means $\tau_i$ (one per stimulus 
strength) obtained from the fitted EPRD model (from Equation 
21). The MLE means $\tau'_i$ are shown as crosses in Figure 4B, 
and the means $\tau_i$ are corresponding points on the fitted curve. 

We also have two estimates of the probability of a comparison 
stimulus response at each stimulus strength: the observed proportion 
of comparison stimulus responses (which is the MLE $P'_i = 
n_i/N_i$), and the mean $P_i$ (one per stimulus strength) obtained 
from fitting the EPRD model (Equation 13) to the MLE means 
$P'_i$. These are shown as dots in Figure 4A, and as corresponding 
points on the fitted curve, respectively. 

6. USING BINARY RESPONSES AND RTs 

In the absence of knowledge regarding the covariance between the 
noise in mean RT and binary response probability, we are forced 
to assume this covariance is zero. In other words, we assume that 
$L_P$ and $L_\tau$ provide independent estimates of the EPRD model 
parameters. In this case, estimates based on combined RT and 
binary response probability are obtained by maximizing the sum 
of log likelihoods 

$$L_c = L_P + L_\tau. \tag{22}$$

However, the implausibility of this independence assumption 
means that we will not take seriously any results based on 
Equation (22). 

7. INFORMATION THEORY 

The amount of Shannon information (Shannon and Weaver, 
1949; MacKay, 2003; Stone, 2014) that the observer gains about 
the stimulus is reflected in both the binary responses and RTs. 
Specifically, the average Shannon information that each mean 
RT provides about the stimulus strength $x$ is the mutual information 
$I(x, \bar{\tau})$ between $x$ and the mean RT $\bar{\tau}$. Similarly, the 
average Shannon information that binary responses provide 
about the stimulus strength $x$ is the mutual information $I(x, P)$ 
between $x$ and the probability of a comparison stimulus binary 
response. 

More importantly, the total amount of Shannon information 
that the observer has about the stimulus cannot be less than 
the amount of Shannon information implicit in the observer's 
combined binary and RT responses. In other words, the total 
mutual information, as measured by an experimenter, between 
observer responses and stimulus strength provides a lower bound 
for the amount of Shannon information that the observer has 
about the stimulus strength. Thus, each mutual information 
value provided in this paper constitutes a conservative estimate 
of the amount of information that the observer gains about the 
stimulus. 

7.1. EVALUATING $I(x, P)$ 

The mutual information $I(x, P)$ between stimulus strength $x$ 
and the probability $P$ that the observer chooses the comparison 
stimulus (i.e., $r = 1$) is 

$$I(x, P) = \int_x \int_P p(x, P) \log \frac{p(x, P)}{p(x)\,p(P)} \, dP \, dx \tag{23}$$
$$\phantom{I(x, P)} = H(x) + H(P) - H(x, P) \text{ bits}, \tag{24}$$

where $H(x)$ and $H(P)$ are the differential entropies of $p(x)$ and 
$p(P)$, respectively, and $H(x, P)$ is the differential entropy of the 
joint distribution $p(x, P)$. All logarithms in this paper use base 
2, so information is measured in bits. Substituting $p(x, P) = 
p(P|x)\,p(x)$ yields 

$$I(x, P) = \int_x p(x) \int_P p(P|x) \log \frac{p(P|x)}{p(P)} \, dP \, dx \tag{25}$$
$$\phantom{I(x, P)} = H(P) - H(P|x) \text{ bits}, \tag{26}$$

where $H(P|x)$ is the differential entropy of the noise in the measurements 
$P$. Given Bayes' rule, $p(P|x) = p(x|P)\,p(P)/p(x)$, we 
can recognize the mutual information as the differential entropy 
$H(P)$ of the prior distribution minus the differential entropy 
$H(P|x)$ of the posterior distribution. 

We can evaluate Equation (25) by summing over discrete versions 
of the variables $x$ and $P$. Recall that the observed proportion 
of responses $r = 1$ at a given stimulus strength $x_i$ is $P'_i = n_i/N_i$, so 
that 

$$I(x, P) = \sum_{k=1}^{N_x} p(x_k) \sum_{i=1}^{N_x} p(P'_i | x_k) \log \frac{p(P'_i | x_k)}{p(P'_i)} \text{ bits}, \tag{27}$$

where $N_x$ is the number of stimulus strengths. We assume that the probability of stimulus values is locally uniform, 
so that $p(x_k) = 1/N_x$. In order to evaluate Equation (27), 
we require expressions for $p(P'_i | x_k)$ and $p(P'_i)$. 

7.1.1. Evaluating the posterior $p(P'_i | x_k)$ 

Using Equation (5) across a range of $x$ values, the fitted value of 
$P$ at $x_k$ is $P_k$. Assuming a binomial distribution, the probability of 
the observed proportion $P'_i$ given a fitted value $P_k$ at $x_k$ is 

$$p(P'_i | x_k) = C_{N_i,n_i} \, P_k^{n_i} (1 - P_k)^{N_i - n_i}, \tag{28}$$

where $p(P'_i | x_k) = p(P'_i | P_k)$, and $p(P'_i | x_k)$ values are normalized to 
ensure that $\sum_i p(P'_i | x_k) = 1$. 

7.1.2. Evaluating the prior $p(P'_i)$ 

The distribution of binary responses is binomial with a mean 
equal to the grand mean $P_G$ of all $N_G$ binary responses of an 
observer 

$$P_G = \frac{1}{N_G} \sum_{i=1}^{N_G} r_i, \tag{29}$$

where $r_i = 1$ if and only if a response corresponds to the observer 
choosing the comparison stimulus. The observer's prior probability 
of the binary responses for the $i$th stimulus strength is 
therefore 

$$p(P'_i) = C_{N_i,n_i} \, P_G^{n_i} (1 - P_G)^{N_i - n_i}, \tag{30}$$

where $p(P'_i)$ values are normalized to ensure that $\sum_i p(P'_i) = 1$. 
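The discrete estimate of $I(x, P)$ in Equation (27) can be sketched by combining the normalized binomial posterior of Equation (28) with the normalized binomial prior of Equation (30). This is one reading of the procedure described above, with a uniform $p(x_k)$; the function names and toy counts are assumptions, and the fitted probabilities are assumed to lie strictly between 0 and 1.

```python
import math


def binomial_pmf(n, big_n, p):
    """Binomial probability of n comparison responses in big_n trials."""
    return math.comb(big_n, n) * p ** n * (1.0 - p) ** (big_n - n)


def normalize(values):
    """Scale a list of non-negative values so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def mutual_information_binary(fitted_p, n_chosen, n_trials):
    """Discrete estimate of I(x, P) in bits (Equation 27)."""
    n_x = len(fitted_p)                           # number of stimulus strengths
    p_grand = sum(n_chosen) / sum(n_trials)       # grand mean, Equation (29)
    prior = normalize([binomial_pmf(n, big_n, p_grand)      # Equation (30)
                       for n, big_n in zip(n_chosen, n_trials)])
    info = 0.0
    for p_k in fitted_p:
        posterior = normalize([binomial_pmf(n, big_n, p_k)  # Equation (28)
                               for n, big_n in zip(n_chosen, n_trials)])
        for post_i, prior_i in zip(posterior, prior):
            if post_i > 0.0:                      # 0 * log(0) is taken as 0
                info += (1.0 / n_x) * post_i * math.log2(post_i / prior_i)
    return info


# Toy example (hypothetical counts and fitted probabilities).
info_bits = mutual_information_binary([0.05, 0.5, 0.95], [1, 10, 19], [20, 20, 20])
print(info_bits)
```

If every fitted probability equals the grand mean, the posterior and prior coincide and the estimate is zero, as expected when responses carry no information about stimulus strength.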
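7.2. EVALUATING $I(x, \bar{\tau})$ 

Following the same line of reasoning as above, the mutual information 
$I(x, \bar{\tau})$ between stimulus strength and mean RT $\bar{\tau}$ is 

$$I(x, \bar{\tau}) = \int_x p(x) \int_{\bar{\tau}} p(\bar{\tau}|x) \log \frac{p(\bar{\tau}|x)}{p(\bar{\tau})} \, d\bar{\tau} \, dx \tag{31}$$
$$\phantom{I(x, \bar{\tau})} = H(\bar{\tau}) - H(\bar{\tau}|x) \text{ bits}, \tag{32}$$

where $H(\bar{\tau}|x)$ is the differential entropy of the noise in the 
measurements $\bar{\tau}$. 

We can evaluate Equation (31) by summing over discrete 
versions of the variables $x$ and $\bar{\tau}$ 

$$I(x, \bar{\tau}) = \sum_{k=1}^{N_x} p(x_k) \sum_{i=1}^{N_x} p(\tau'_i | x_k) \log \frac{p(\tau'_i | x_k)}{p(\tau'_i)} \text{ bits}, \tag{33}$$

where $N_x$ is the number of stimulus strengths and $p(\tau'_i | x_k)$ is defined by the EPRD model (Equation 8) with a 
fitted value $\tau_k$, so that 

$$p(\tau'_i | x_k) = p(\tau'_i | \tau_k(\theta_\tau)), \tag{34}$$

as in Equation (18). As before, we assume that the probability of 
stimulus values is uniform, so that $p(x_k) = 1/N_x$. 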

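7.2.1. Evaluating the posterior $p(\tau'_i | x_k)$ 

The posterior is defined in Equation (18), but is repeated here 
with changed subscripts for clarity 

$$p(\tau'_i | x_k) = \frac{1}{\sqrt{2\pi v_{\bar{\tau}_k}}} \exp\left[ -\frac{(\tau'_i - \tau_k)^2}{2 v_{\bar{\tau}_k}} \right], \tag{35}$$

where $v_{\bar{\tau}_k}$ is defined in Equation (19), and $p(\tau'_i | x_k)$ values are 
normalized to ensure that $\sum_i p(\tau'_i | x_k) = 1$. 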

7.2.2. Evaluating the prior $p(\tau'_i)$ 

A parametric form for the observer's prior probability distribution 
$p(\tau)$ of individual RTs was estimated from the entire set of 
that observer's grand total of $N_G$ RTs. These were fitted to an 
inverse Gaussian distribution to obtain a grand mean $\tau_G$ and a 
parameter $\lambda_G$. This pdf has a variance 

$$v_G = \tau_G^3 / \lambda_G. \tag{36}$$

At each stimulus strength $x_i$, the RT mean is based on a sample 
of $N_i$ RTs, and the central limit theorem suggests that the 
distribution of means is approximately Gaussian with a variance 

$$v_{\bar{G}} = v_G / N_i. \tag{37}$$

Therefore, the prior probability density of each inverse Gaussian 
mean $\tau'_i$ is 

$$p(\tau'_i) = \frac{1}{\sqrt{2\pi v_{\bar{G}}}} \exp\left[ -\frac{(\tau'_i - \tau_G)^2}{2 v_{\bar{G}}} \right], \tag{38}$$

where $p(\tau'_i)$ values are normalized to ensure that $\sum_i p(\tau'_i) = 1$. 

7.3. THE SHANNON INFORMATION OF A SINGLE RESPONSE 

So far we have derived expressions for the Shannon information 
implicit in the average RT $\tau_i$ and also in the average binary 
response, which is summarized as the proportion $P_i$ of comparison 
responses, for a stimulus strength $x_i$. Here, we derive an 
expression for the Shannon information associated with a single 
trial; first for RTs, and then for binary responses. 

As the number of trials at each stimulus strength is increased, 
so the variance in each mean RT decreases, and the central limit 
theorem ensures that the distribution of means becomes increasingly 
Gaussian. The mutual information between two variables 
(e.g., mean RT and stimulus strength) depends on the signal to 
noise ratio SNR 

$$I \le \tfrac{1}{2} \log_2(1 + \mathrm{SNR}), \tag{39}$$

where SNR is the signal variance expressed as a fraction of the 
noise variance in the measurement (Shannon and Weaver, 1949). 
If the distribution of mean RTs is Gaussian then the distribution 
of differences $\Delta\tau$ between mean RT $\bar{\tau}$ and the grand mean 
RT (at one stimulus strength) must also be Gaussian. Because 
the mutual information is defined in Equation (32) to be the 
differential entropy of $\bar{\tau}$ minus the differential entropy of the 
noise $\Delta\tau$ in $\bar{\tau}$, we can assume equality in Equation (39) (Rieke 
et al., 1997). In fact, we do not need to rely on the central limit 
theorem here, because even if the perturbing noise $\Delta\tau$ is not 
Gaussian, Shannon's Theorem 18 (Shannon and Weaver, 1949) 
implies equality in Equation (39), so that 

$$I = \tfrac{1}{2} \log_2(1 + \mathrm{SNR}) \text{ bits}. \tag{40}$$

We already have a value for the mutual information $I(x, \bar{\tau})$ from 
Equation (33), so we can re-arrange Equation (40) to find the 
SNR associated with $\bar{\tau}$ 

$$\mathrm{SNR}_{\bar{\tau}} = 2^{2I(x, \bar{\tau})} - 1. \tag{41}$$

However, the mutual information $I(x, \bar{\tau})$ obtained from Equation 
(33) tells us how much average Shannon information each mean 
RT provides about stimulus strength, whereas we want to know 
how much average information each individual RT provides 
about stimulus strength. Because the value of SNR in Equation 
(41) is based on mean RTs, each of which involves $N_i$ trials, the 
variance of the measurement noise has been reduced by a factor 
of $N_i$ relative to the noise in the RT of a single trial (provided this 
noise is iid). This implies that the value of SNR for a single trial is 

$$\mathrm{SNR}_\tau = \mathrm{SNR}_{\bar{\tau}} / N_i \tag{42}$$
$$\phantom{\mathrm{SNR}_\tau} = (2^{2I(x, \bar{\tau})} - 1)/N_i. \tag{43}$$

If we substitute $\mathrm{SNR}_\tau$ into Equation (40) then we obtain an estimate 
of the average Shannon information $I(x, \tau)$ implicit in the 
observer's RT in a single trial 

$$I(x, \tau) = \frac{1}{2} \log_2\left[ 1 + \frac{2^{2I(x, \bar{\tau})} - 1}{N_i} \right] \text{ bits}. \tag{44}$$

A similar line of reasoning implies that the average Shannon 
information $I(x, r)$ implicit in the observer's binary response $r$ in 
a single trial is 

$$I(x, r) = \frac{1}{2} \log_2\left[ 1 + \frac{2^{2I(x, P)} - 1}{N_i} \right] \text{ bits}. \tag{45}$$

In order to compare mutual information estimates for the different 
variables $\tau$ and $r$, the calculations for $I(x, \tau)$ and $I(x, r)$ should 
be based on the same range of stimulus strengths x. 
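The conversion from mutual information based on means to mutual information per trial (Equations 41 to 45) is short enough to sketch directly. The function name is an assumption; the input values of 2.79 and 4.82 bits with 20 trials per stimulus strength are the figures reported in the results of this paper, and the sketch yields values close to the reported 0.87 and 2.68 bits per trial.

```python
import math


def single_trial_information(info_from_means, n_trials):
    """Per-trial mutual information in bits (Equations 44 and 45).

    info_from_means : mutual information (bits) estimated from means or
                      proportions, each based on n_trials trials
    n_trials        : number of trials per stimulus strength
    """
    snr_means = 2.0 ** (2.0 * info_from_means) - 1.0  # Equation (41)
    snr_single = snr_means / n_trials                 # Equations (42, 43)
    return 0.5 * math.log2(1.0 + snr_single)          # Equation (40)


info_rt = single_trial_information(2.79, 20)       # from mean RTs
info_binary = single_trial_information(4.82, 20)   # from response proportions
print(info_rt, info_binary)
```

With a single trial per stimulus strength the function returns its input unchanged, which is a useful sanity check on the algebra.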

7.4. DEFINING THE SHANNON INCREMENT 

The mutual information between stimulus strength and (binary 
or RT) responses can be used to define the smallest average 
detectable difference in stimulus strength, which we call the 
Shannon increment (SI). We first define the effective stimulus 
range $x_{range}$ as the range of stimulus strengths $x$ associated with 
response probabilities between $P = \epsilon$ and $P = 1 - \epsilon$, for some 
small value $\epsilon$. Then the SI is related to the mutual information 
$I$ by 

$$SI = \frac{x_{range}}{2^I}, \tag{46}$$

where the value 2 is based on the assumption that information 
is measured in bits (i.e., using log to the base 2), and SI has the 
same units as stimulus strength. Because SI decreases monotonically 
with mutual information, it should become asymptotically 
closer to the true value of SI as the number of trials or stimulus 
strengths is increased. 

A brief explanation for this definition is as follows. Consider 
a range of stimulus strengths $x_{range}$ which give rise to "noisy" 
observer responses $y = f(x)$, where these responses are samples 
from a probability density function $p(y(x))$, and where the mutual 
information between $x$ and $y$ is $I$ bits. One way to interpret SI 
involves assuming that $p(y(x))$ is uniform. In this case, on average, 
knowing the value of $y$ reduces the possible range of $x$ values 
to an interval $\Delta x = x_{range}/2^I$, which we can recognize as being 
equal to the SI. 
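Equation (46) amounts to dividing the effective stimulus range into $2^I$ equally sized intervals, which can be sketched as follows. The function name and the example range of 0.2 are assumptions for illustration; the effective range in practice depends on the chosen value of $\epsilon$.

```python
def shannon_increment(x_range, info_bits):
    """Shannon increment (Equation 46), in the units of stimulus strength."""
    return x_range / (2.0 ** info_bits)


# Hypothetical effective range of 0.2 stimulus units.
for bits in (0.0, 1.0, 2.0, 3.0):
    print(bits, shannon_increment(0.2, bits))
```

Each additional bit of mutual information halves the Shannon increment, so zero bits leaves the whole range unresolved.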

8. FAT-FACE THIN: A DEMONSTRATION EXPERIMENT 

We used the EPRD models described above to estimate the PSE 
and other key parameters for a simple demonstration experiment 
using a human observer. On each trial, the observer was presented 
with a colored picture of an upright face and an inverted face 
(see Figure 2) on a computer screen, and was required to indicate 
which one appeared to be wider by pressing a left/right computer 
key. For half of the trials, the reference stimulus was an upright 
face, and the comparison stimulus was an inverted version of the 
same face, and these were swapped for the other half of the trials. 
The width of the comparison image was determined by 1 of 
21 stretch factors $s = 0.90, 0.91, \ldots, 1.10$, but the height of both 
stimuli was kept constant. The stimulus strength was defined to 
be $x = s - 1$, so that $x$ varied between $-0.1$ and $0.1$. For a given 
value of $s_i$, the observer was presented with the same stimulus pair 
for a total of $N_i = 20$ trials. Stimuli were shown in random order, 
and the left/right position of reference/comparison stimuli was 
counterbalanced across trials. 

8.1. RESULTS 

Each of three models defined by $L_P$, $L_\tau$, and $L_c$ was used to 
fit a psychometric and/or a chronometric function to the data 
from one subject, as shown in Figure 4. Maximum likelihood 
parameter estimation was implemented in MATLAB using the 
Nelder-Mead simplex method. The parameter estimates for each 
model are summarized in Table 1. 

8.2. USING BINARY RESPONSES: MODEL $L_P$ 

Based on 420 binary responses, maximizing $L_P$ (Equation 12) 
yields a psychometric function similar to that in Figure 4A, 
and a PSE of $s_{PSE} = 1.031$. This maximum likelihood estimate 
implies that an inverted face must be 3.1% wider than 
an upright face in order for the two faces to be perceived as 
the same width. Numerical estimation of the Hessian matrix 
of second derivatives of Equation (12) at $s_{PSE}$ yields a standard 
error (se) of 0.003, which implies that $s_{PSE}$ is significantly 
different from $s = 1$ ($p < 0.001$). The values of three parameters 
were estimated for this model, the PSE, $Ak$, and $\gamma$, and 
the product $Ak$ is quoted in Table 1 for comparison with other 
works. 

FIGURE 1 | How the entropy $H(x)$ in stimulus strength $x$ is accounted 
for by the entropy $H(\tau)$ in RT ($\tau$) and entropy $H(P)$ in the probability $P$ 
of a particular binary response $r$. The entropies of $x$, $P$, and $\tau$ are 
represented by the discs X, Y, and Z, respectively. The mutual information 
between $x$ and $P$ is $I(x, P) = (a + b)$, and the mutual information between 
$x$ and $\tau$ is $I(x, \tau) = (a + c)$. 

FIGURE 2 | Schematic illustration of typical stimulus shown to 
observer on a single trial. The observer has to choose the face that looks 
wider. The stimulus in the experiment used was a picture of the actor 
James Corden's face, with all background details removed (see 
http://illusionoftheyear.com/2010/the-fat-face-thin-fft-illusion). 

8.3. USING MEAN REACTION TIMES: MODEL $L_\tau$ 

Each of 21 mean RTs (one per stimulus strength) was first estimated 
by maximizing Equation (17), based on 20 RTs per stimulus 
strength. Using these 21 mean RTs, $L_\tau$ (Equation 21) was 
maximized with respect to four parameters (PSE, $A$, $k$, and $\tau_{res}$) 
to yield a chronometric function similar to that in Figure 4B. The 
estimated PSE is $s_{PSE} = 1.034$ (se = 0.004, $p < 0.001$). 

8.4. USING MEAN RTs AND OBSERVER RESPONSES: MODEL Lc 

Based on 42 data points (the 21 estimated mean RTs used for Li 
plus 21 corresponding binary response probabilities used for Lp), 




2 3 4 
RT (second) 

FIGURE 3 I Reaction times fitted with an inverse Gaussian 

(Equation 15). Each dot represents 1 of 20 RTs for a stimulus value (width 

scaling) of s = 1 .05. 



maximizing L_c (Equation 22) yields the psychometric function 
and the chronometric function in Figures 4A,B, respectively, and 
a PSE of 1.032 (se = 0.003, p < 0.001). There are five parameters 
to be estimated for this model: the PSE, A, k, τ_res, and γ. 
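
The significance statements quoted for the three PSE estimates can be checked with a simple Wald test. The sketch below is illustrative only: it assumes a normal approximation for the estimator, uses the rounded values quoted in the text and in Table 1, and treats the binary and RT estimates as independent when comparing them (an assumption, because both come from the same trials). The function name `wald_p_value` is hypothetical.

```python
import math

def wald_p_value(estimate, se, null_value=1.0):
    """Two-sided p-value for H0: estimate == null_value (normal approximation)."""
    z = (estimate - null_value) / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))

# PSE estimates and standard errors as quoted in the text and in Table 1.
models = {"binary": (1.031, 0.003), "RT": (1.034, 0.004), "combined": (1.032, 0.003)}

# Test each PSE against s = 1 (no illusion).
results = {name: wald_p_value(pse, se) for name, (pse, se) in models.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p:.2g}")

# Compare the binary and RT estimates with each other.
# Assumption: the two estimates are treated as independent.
diff = models["RT"][0] - models["binary"][0]
se_diff = math.hypot(models["RT"][1], models["binary"][1])
z_diff, p_diff = wald_p_value(diff, se_diff, null_value=0.0)
print(f"binary vs RT: z = {z_diff:.2f}, p = {p_diff:.2f}")
```

Under these assumptions each PSE differs from s = 1 with p < 0.001, whereas the binary and RT estimates do not differ significantly from each other.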

8.5. SHANNON INFORMATION 

The mutual information I(x, τ) between x and τ is the entropy 
in p(x) and p(τ) shared by the joint distribution p(x, τ). 
Using Equation (33), this evaluates to I(x, τ) = 2.79 bits. Using 
Equation (44) with N_i = 20, this implies that the mutual 
information I(x, τ) for a single RT is I(x, τ) = 0.87 bits, and is 
represented by the intersection of regions X and Z in Figure 1. 

Similarly, Equation (27) can be used to estimate the mutual 
information between x and P, which comes to I(x, P) = 4.82 bits. 
Using Equation (45) with N_i = 20, this implies that the mutual 
information I(x, r) for a single binary response r is I(x, r) = 
2.68 bits, and is represented by the intersection of regions 
X and Y in Figure 1. 

We can use I(x, τ) and I(x, r) to provide lower and upper 
bounds on the total amount of mutual information I_tot between x 
and the combined variables (r, τ), which can be considered to be a 
vector variable. If τ and r provide independent information about 
x (i.e., if a = 0 in Figure 1) then the maximum value of I_tot is 



max(I_tot) = I(x, τ) + I(x, r)    (47) 
           = 0.87 + 2.68          (48) 
           = 3.55 bits.           (49) 



However, if all of the information I(x, τ) provided by τ about x 
is the same as part of the information provided by r about x (i.e., 
if c = 0 in Figure 1) then I_tot cannot be less than I(x, r). To take 
account of the possibility that all of the information I(x, r) 
provided by r about x is the same as part of the information provided 
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FIGURE 4 I The psychometric function (A) and chronometric function 
(B), from the face inversion experiment for one observer. The width 
scaling factor s applied to the comparison image is indicated on the 
abscissa. The vertical dashed line marks the point-of-subjective-equality 
(PSE) at s = 1.032. (A) Each dot represents the observed proportion of 
trials for which the observer chose the comparison stimulus, and the 
fitted psychometric function is defined in Equation 6. (B) Each dot 
represents the RT of a single trial for the same responses as in 
Figure 4A (RTs greater than 2 s are not shown). The fitted chronometric 
function is defined in Equation 8. The dashed curve joins the fitted 
(inverse Gaussian) mean RTs, each of which was obtained by maximizing 
Equation 17. The solid curves in (A, B) (Equations 6, 8, respectively) 
were fitted using combined binary and mean RT data by maximizing 
Equation 22. A graph similar to (A) was obtained for model L_p (i.e., 
using only binary responses), and a graph similar to (B) was obtained for 
model L_τ (i.e., using only mean RTs). 



Table 1 | Results for three models. 

Model        PSE             A      k      A × k  τ_res (s)  γ      LLik    MI (bits) 
Binary L_p   1.031 ± 0.003   NA     NA     22.32  NA         0.005  -31.13  2.68 
RT L_τ       1.034 ± 0.004   0.998  28.37  28.32  0.437      NA     18.7    0.87 
Comb L_c     1.032 ± 0.003   1.016  23.12  23.50  0.354      0.011  -13.10  3.18 

Binary model: based only on binary response probability (Equation 12). 
RT model: based only on mean RT (Equation 17). 
Comb (combined model): based on binary response probability and mean RT (Equation 22). 

PSE, point of subjective equality (± indicates standard error); A and k are EPRD parameters; τ_res is the fixed part of RT; γ, lapse rate; LLik, log likelihood; and 
MI, mutual information between stimulus strength and RT or binary responses or both (see text). The final number (3.18 bits) represents I(x, r) = 2.68 plus 
I(x, τ) = 0.497, computed using parameter values obtained from Equation 22. 



by τ about x, we can write 

min(I_tot) = max(I(x, τ), I(x, r))    (50) 
           = max(0.87, 2.68)          (51) 
           = 2.68 bits.               (52) 

Thus, on average, each trial provides the observer with between 
2.68 and 3.55 bits. 
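
The two bounds can be restated as a short calculation. The following is a minimal sketch of Equations (47) to (52) using the per-trial values quoted above; the function name `total_information_bounds` is hypothetical.

```python
def total_information_bounds(i_rt, i_response):
    """Bounds (in bits) on the information carried jointly by RT and response.

    Lower bound: one variable is fully redundant with the other,
    so the total equals the larger of the two.
    Upper bound: the two variables carry independent information,
    so the total equals their sum.
    """
    lower = max(i_rt, i_response)
    upper = i_rt + i_response
    return lower, upper

# Per-trial values quoted in the text: I(x, tau) = 0.87, I(x, r) = 2.68.
lower, upper = total_information_bounds(0.87, 2.68)
print(f"{lower:.2f} <= I_tot <= {upper:.2f} bits per trial")
```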

8.6. SHANNON INCREMENT 

A conservative estimate of mutual information of I = 2.68 
bits suggests that the observer can discriminate differences 
between the reference and comparison stimulus with an average 
resolution of about one part in 6.39 (= 2^I) of the effective 
range x_range of stimulus strengths. Note that the range of scaling 
values used, s_range = 0.2 (i.e., 0.9 ... 1.1), equals the range of 
stimulus strengths x_range = 0.2 (i.e., -0.1 ... 0.1). Therefore, the SI 
for the width scaling factor is 



SI = x_range / 2^I    (53) 
   = 0.2/6.39         (54) 
   = 0.031,           (55) 

where we have assumed t = 0 here. Thus, on average, the smallest 
change in scaling factor (between reference and comparison 
stimulus) detectable by the observer is SI = 0.031. 
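
As a check on the arithmetic, the Shannon increment can be computed directly from its definition as the stimulus range divided by 2^I. This is a minimal sketch with a hypothetical function name. Because it starts from the rounded value I = 2.68, the divisor comes out near 6.4 rather than exactly the quoted 6.39, but the result agrees to the precision reported.

```python
def shannon_increment(x_range, info_bits):
    """Smallest resolvable change: the stimulus range divided by 2**I."""
    return x_range / 2.0 ** info_bits

# Values quoted in the text: range of 0.2 and a conservative I = 2.68 bits.
si = shannon_increment(0.2, 2.68)
print(f"SI = {si:.3f}")
```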

9. DISCUSSION 

We have shown how the PRD model from Palmer et al. (2005) 
can be extended to make use of individual RTs, which can be 
combined with binary observer responses to estimate key psy- 
chophysical parameters in a 2AFC design. 

A key feature of diffusion-based models is that they treat 
each RT as the end-point of an accumulation of evidence. If 
we take this type of evidence-accumulation process seriously 
then it makes sense to model the distribution of RT values 
as an inverse Gaussian distribution (for reasons described in 
section 5). 
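
To make the inverse Gaussian fit concrete, the sketch below computes maximum likelihood estimates for a set of RTs. It assumes the standard two-parameter form with mean mu and shape lam, which may be parameterized differently from Equation 15. The RT values are hypothetical and are not data from the experiment. For this form the estimates are available in closed form, so no numerical optimizer is needed.

```python
import math

def fit_inverse_gaussian(rts):
    """Closed-form MLE for the standard inverse Gaussian (mean mu, shape lam)."""
    n = len(rts)
    mu = sum(rts) / n                                # MLE of the mean
    lam = n / sum(1.0 / t - 1.0 / mu for t in rts)   # MLE of the shape
    return mu, lam

def log_likelihood(rts, mu, lam):
    """Log likelihood of the RTs under an inverse Gaussian with (mu, lam)."""
    return sum(
        0.5 * math.log(lam / (2.0 * math.pi * t ** 3))
        - lam * (t - mu) ** 2 / (2.0 * mu ** 2 * t)
        for t in rts
    )

# Hypothetical RTs in seconds at one stimulus strength (illustrative only).
rts = [0.52, 0.61, 0.48, 0.75, 0.66, 0.58, 0.92, 0.55, 0.63, 0.70]
mu_hat, lam_hat = fit_inverse_gaussian(rts)
ll_hat = log_likelihood(rts, mu_hat, lam_hat)
print(f"mean RT = {mu_hat:.3f} s, shape = {lam_hat:.2f}, log likelihood = {ll_hat:.2f}")
```

Under this form the fitted mean is simply the sample mean of the RTs, and the shape parameter captures how tightly the RTs cluster around it.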
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A striking result is the difference between the log likelihoods 
associated with the binary response model and the RT model, 
despite the fact that the binary response model has fewer free 
parameters than the RT model, and that both models provide 
similar PSE estimates which (based on their sems, not shown) are 
not significantly different. These log likelihood values suggest that 
the EPRD model provides a better fit to the RT data than it does 
to the binary response data. This difference in likelihoods suggests 
that the parameter estimates obtained using the combined RT and 
response data are dominated by the binary data likelihood term. 

Self-evidently, both the RT and binary responses of an observer 
depend on the stimulus strength x. However, in general, it is 
not known if RT or binary response data provide more Shannon 
information about the value of x. More importantly, and more 
subtly, it is not known if they provide the same information 
about x, or if they merely provide the same amount of information 
about x (see Figure 1). 

We can gain some insight into the nature of this problem by 
considering the proportion of the differential entropy in stimu- 
lus values accounted for by the corresponding differential entropy 
in observer responses. At one extreme, if an observer is told 
to respond as quickly as possible then the RTs should pro- 
vide relatively large amounts of mutual information regarding 
stimulus strength, whereas the binary responses carry relatively 
little mutual information (because speeded responses tend to 
be inaccurate; Hanks et al., 2011). In this case, the RT entropy 
at a given stimulus strength will be relatively small, because 
RTs will be tightly coupled to the stimulus strength, whereas 
the binary response entropy at a given stimulus strength will 
be relatively large (because these responses are inaccurate, and 
therefore not tightly coupled to the stimulus strength). However, 
when considered across different stimulus strengths, the tight 
coupling between RT and stimulus strength will give rise to a 
relatively large RT entropy, and most of this entropy will be 
shared with stimulus strength entropy (which defines a large 
mutual information between RT and stimulus strength). In con- 
trast, these fast, inaccurate responses across stimulus strengths 
will be associated with a relatively small range of response 
probability values (e.g., P ≈ 0.5), which will therefore have a 
relatively small entropy, most of which is not shared with the 
stimulus strength entropy (which defines a small mutual infor- 
mation between binary responses and stimulus strength). In 
summary, fast responses should yield high entropy RT values, 
which share a large proportion of their entropy with the stim- 
ulus strength, combined with low entropy P values which share 
a small proportion of their entropy with the stimulus strength. 
At the other extreme, if an observer is told to be as accu- 
rate as possible then this should yield high entropy P values 
which share a large proportion of their entropy with the stimu- 
lus strength, combined with low entropy RT values which share 
a small proportion of their entropy with the stimulus strength. 
In summary, the entropy in stimulus strength can be shared 
with entropy in both accuracy (P) and speed (RT). However, as 
there is probably only a finite amount of such shared entropy 
(mutual information) available, we predict that it can be real- 
ized experimentally as maximum speed or maximum accuracy, 
but not both. 



The scenario considered above can be represented geometrically, 
as in Figure 1. If we compare the mutual information 
between x and τ with the mutual information between r and 
x then it is possible that they have the same magnitude [e.g., 
(a + c) = (a + b), as in Figure 1]. However, the fact that both 
τ and r have the same amount of mutual information (i.e., they 
account for the same amount of entropy in x) does not imply that 
they account for the same entropy in x. Formally, the fact that 
the regions (a + c) and (a + b) have equal areas does not imply 
that they are the same region. This matters because, even if 
I(x, τ) and I(x, r) were equal in magnitude, we could not conclude 
that they refer to the same information, and so we could not 
conclude that τ and r provide mutually redundant information. 
Thus, we cannot dismiss τ simply because r accounts for more 
entropy in x than τ does (or vice versa). Indeed, this is precisely 
the situation that we have in the results reported here, and 
provides reasonable grounds 
for making use of both RT and binary response data in general. 
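
The distinction between equal amounts of information and identical information can be illustrated with a toy example. The sketch below is not based on the experimental data: it uses a hypothetical stimulus with four equiprobable values and two hypothetical binary stand-ins for the response and the RT. In the first case the two variables carry complementary information about x, and in the second they are fully redundant, yet each variable on its own provides exactly 1 bit in both cases.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """I(A; B) in bits, from a list of equiprobable (a, b) outcomes."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )

xs = [0, 1, 2, 3]                        # hypothetical stimulus values (2 bits)
r = {x: x // 2 for x in xs}              # stand-in response: high bit of x
tau_comp = {x: x % 2 for x in xs}        # stand-in RT code: low bit (complementary)
tau_red = {x: x // 2 for x in xs}        # stand-in RT code: high bit (redundant)

i_r = mutual_information([(x, r[x]) for x in xs])
i_comp = mutual_information([(x, tau_comp[x]) for x in xs])
i_red = mutual_information([(x, tau_red[x]) for x in xs])
i_joint_comp = mutual_information([(x, (r[x], tau_comp[x])) for x in xs])
i_joint_red = mutual_information([(x, (r[x], tau_red[x])) for x in xs])

print(i_r, i_comp, i_red)         # each variable alone
print(i_joint_comp, i_joint_red)  # both variables together
```

In the complementary case the joint information equals the sum of the two individual values, whereas in the redundant case it equals only the larger of them, which corresponds to the upper and lower bounds on I_tot.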

Unfortunately, we have been unable to derive an expression 
for the total mutual information between the joint variables (RT 
and binary responses) and stimulus strength, I(τ, P; x) (i.e., the 
area [a + b + c] in Figure 1), although it may be possible to do so 
using Equation (10) [where the entropy of the difference between 
P and τ is H(τ, P|x)]. The precise effect of the instructions given 
to observers on mutual information, and the proposed invariance 
of the total mutual information with respect to instructions, 
clearly require further research (Soukoreff and MacKenzie, 2009). 

The Shannon increment (SI) is similar in spirit to the more 
conventional just noticeable difference (JND). However, the JND 
has an arbitrary value, and (despite its name) there is no reason to 
suppose that a JND is indeed just noticeable. The SI is monoton- 
ically related to the average amount of Shannon information an 
observer gains regarding a single presentation of a stimulus, and 
is a measure of the perceptual resolution with which a parameter 
is represented by the observer. 

10. CONCLUSION 

We have presented an extended proportional-rate diffusion 
model, which takes account of both individual RTs and binary 
responses for maximum likelihood estimation of key psychophys- 
ical parameters (e.g., PSE, slope) of the psychometric and chrono- 
metric functions. The fact that these psychophysical parameters 
have similar estimated values when computed independently for 
two models based on RTs alone or on binary responses alone pro- 
vides support for the underlying physical basis of this class of 
diffusion models. 

An information-theoretic analysis was used to estimate the 
average amount of Shannon information that each RT pro- 
vided about the stimulus value, and also the average amount of 
Shannon information that each binary response provided about 
the stimulus value. This analysis provides bounds for the average 
amount of Shannon information that the observer gains about 
the stimulus value from one presentation, which was found to be 
between 2.68 and 3.55 bits/trial for the experiment used here. 
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APPENDIX 

MATHEMATICAL SYMBOLS AND ABBREVIATIONS 

A an EPRD model parameter which is the amount of evidence 
required to trigger a response. 

comparison stimulus response: a response indicating the compar- 
ison stimulus was chosen. 

EPRD: extended proportional-rate diffusion model. 

SI: Shannon increment, the smallest detectable change in a 
stimulus. 

γ EPRD lapse rate parameter. 

i index over stimulus strength x, with range i = 1, ..., N_x. 

j index over trials at one stimulus strength x_i, with range 
j = 1, ..., N_i. 

k index over stimulus strength, with range k = 1, ..., N_x. 

K is a measure of sensitivity to changes in x in the EPRD model. 

N_i number of trials at stimulus strength x_i. 

N_x number of different stimulus strengths. 

PSE: point of subjective equality. 

P_i proportion of comparison stimulus responses at stimulus 
strength x_i, predicted by EPRD model. 

P̂_i MLE mean, equal to observed proportion of comparison 
responses at stimulus strength x_i. 

r binary observer response (e.g., observer chooses comparison 
or reference stimulus). 

s_c variable stimulus value of the comparison stimulus. 

s_r fixed stimulus value of the reference stimulus. 

s_PSE value of the comparison stimulus which the observer 
perceives as being the same as the reference stimulus. 

τ̂_i MLE mean of inverse Gaussian RT at stimulus 
strength x_i. 

τ_i mean RT at stimulus strength x_i, as predicted by EPRD 
model. 

τ_dec,i mean decision RT at stimulus strength x_i, as predicted by 
EPRD model. 

τ_res mean residual RT (assumed the same at all stimulus 
strengths), as predicted by EPRD model, where 
τ_res = τ_i - τ_dec,i. 

θ_τ = (s_PSE, A, K, γ, τ_res), five parameters for the RT component 
of the EPRD model. 

θ_P = (s_PSE, AK, γ), three parameters for the binary response 
component of the EPRD model. 

v_τ,i variance in mean RT. 

x_i stimulus strength. 

x'_i perceived strength of stimulus with strength x_i. 
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